Guest Lecture for MIT 18.S096 Topics in Mathematics with Applications in Finance
Jonathan Larkin
October 2, 2025
Disclaimer
This presentation is for informational purposes only and reflects my personal views and interests. It does not constitute investment advice and is not representative of any current or former employer. The information presented is based on publicly available sources. References to specific firms are for illustrative purposes only and do not imply endorsement.
About Me
Managing Director at Columbia Investment Management Co., LLC, generalist allocator, Data Science and Research lead. Formerly CIO at Quantopian, Global Head of Equities at Millennium Management LLC, and Co-Head of Equity Derivatives Trading at JPMorgan.
Condorcet Jury Theorem (1785)
The Condorcet Jury Theorem states that if each member of a jury independently has a probability greater than 1/2 of making the correct decision, then as the number of jurors increases, the probability that the majority decision is correct approaches 1.
\[
P(\text{majority correct}) \to 1 \text{ as } n \to \infty
\]
provided each juror is correct with probability \(p > 1/2\) and the jurors' errors are independent.
e.g., sklearn.ensemble.VotingClassifier relies on this result.
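The convergence can be checked with a quick Monte Carlo simulation (a pure-Python sketch with illustrative names, not VotingClassifier itself):

```python
import random

def condorcet_majority(p: float, n: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(majority of n independent jurors is correct),
    where each juror is correct with probability p."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        correct_votes = sum(rng.random() < p for _ in range(n))
        if correct_votes > n / 2:
            wins += 1
    return wins / trials

# With p = 0.6, the majority's accuracy climbs toward 1 as n grows:
for n in (1, 11, 101):
    print(n, round(condorcet_majority(0.6, n), 3))
```

Note that the simulation draws each juror's vote independently; with correlated errors the limit no longer holds, which is why ensembles emphasize diverse, decorrelated base models.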
Boosting Weak Learners (1988)
Kearns, Michael. Thoughts on Hypothesis Boosting. 1988.
Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. 2001.
Sequentially train many “weak learner” models, each focusing on the errors of the previous ones.
Gradient boosted decision trees remain the dominant approach in tabular machine learning today.
Boosting in a Nutshell
The final model after M rounds is a weighted sum of weak learners \(h_m(x)\): \[
F_M(x) = \sum_{m=1}^M \gamma_m h_m(x)
\]
Each step fits a learner to the residuals (the negative gradient of the loss) and adds it with weight \(\gamma_m\):
\[
F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)
\]
👉 Each new learner reduces the cumulative errors.
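The update above can be sketched in a few lines of pure Python (illustrative names, not a library implementation; a depth-1 decision stump stands in for the weak learner, and a fixed shrinkage gamma replaces the per-round weight):

```python
def fit_stump(xs, residuals):
    """Best single-threshold constant-pair predictor on 1-D inputs."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - (lm if x <= t else rm)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def boost(xs, ys, rounds=50, gamma=0.3):
    f0 = sum(ys) / len(ys)                      # F_0: constant model
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # residuals = negative gradient of squared loss at current F
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + gamma * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + sum(gamma * h(x) for h in stumps)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8]
model = boost(xs, ys)   # training error shrinks round by round
```

Each stump corrects part of the ensemble's remaining error, so the training residuals decay geometrically with the number of rounds.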
Model Stacking (1992)
Wolpert, David H. Stacked Generalization. 1992.
Train a “meta-model” on the predictions of the base models.
Works best when base models are diverse and capture different aspects of the data.
e.g., sklearn.ensemble.StackingClassifier
Stacking in a Nutshell
Combine multiple different models by training a new model on their predictions.
Step 1: Train base models (e.g. linear regression, tree, neural net).
Step 2: Collect their predictions on out-of-fold data.
Step 3: Train a meta-model on those predictions. \[
\hat{y} = g\big(f_1(x), f_2(x), \dots, f_K(x)\big)
\]
where \(f_k\) are base models, and \(g\) is the meta-model.
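The three steps can be sketched as follows (hypothetical helper names, not a library API; two toy base models, leave-one-out predictions as the out-of-fold data, and a one-parameter least-squares blend as the meta-model \(g\)):

```python
def fit_linear(xs, ys):
    """Ordinary least squares y ≈ a*x + b on 1-D inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def fit_mean(xs, ys):
    """Constant baseline: predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def stack(xs, ys, base_fitters):
    # Step 2: out-of-fold (leave-one-out) predictions from each base model
    oof = [[f(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])(xs[i])
            for f in base_fitters] for i in range(len(xs))]
    p1, p2 = [r[0] for r in oof], [r[1] for r in oof]
    # Step 3: meta-model g(p1, p2) = alpha*p1 + (1 - alpha)*p2, with
    # alpha chosen by least squares on the out-of-fold predictions
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    alpha = (sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2)) / den
             if den else 0.5)
    models = [f(xs, ys) for f in base_fitters]  # refit bases on all data
    return lambda x: alpha * models[0](x) + (1 - alpha) * models[1](x)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.2, 1.9, 3.1, 3.9]        # nearly linear data
model = stack(xs, ys, [fit_linear, fit_mean])
```

Fitting the meta-model on out-of-fold rather than in-sample predictions is the key design choice: it keeps \(g\) from simply trusting whichever base model overfits the training data most.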
Dunbar’s Number (1992)
Dunbar, R. I. M. (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution, 22(6), 469–493.
Humans can maintain ≈150 stable relationships
Limit of trust & cohesion
Beyond → silos, slow decisions, culture strain
Dunbar cont’d: How Hedge Funds Manage It
Pods → small teams, central risk
Tech → scale with models, not people
Lean → cap size, preserve culture
Bureaucracy → heavy process to scale
👉 Hedge funds scale by respecting Dunbar or building around it.
Wisdom of Crowds (2004)
Surowiecki, James. The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations. Doubleday, 2004.
For the crowd to be smarter than experts, we require
Diversity of opinion
Independence of members
Decentralization
Aggregation of information
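A toy simulation (illustrative numbers, not from the book) shows why independence and aggregation matter: averaging many independent noisy guesses beats a typical individual guess, while a shared bias (correlated errors) puts a floor under the crowd's accuracy.

```python
import random
import statistics

def crowd_vs_individual(truth: float = 100.0, n: int = 1000,
                        noise: float = 20.0, shared_bias: float = 0.0,
                        seed: int = 0):
    """Return (error of the crowd average, mean error of an individual).
    shared_bias models correlated errors, which violate independence."""
    rng = random.Random(seed)
    guesses = [truth + shared_bias + rng.gauss(0, noise) for _ in range(n)]
    crowd_error = abs(statistics.fmean(guesses) - truth)
    typical_error = statistics.fmean(abs(g - truth) for g in guesses)
    return crowd_error, typical_error

# Independent errors: the crowd average is far better than a typical guess.
# With a shared bias, averaging cannot remove the common error component.
```

With independent errors the crowd's error shrinks roughly like \(1/\sqrt{n}\); any common bias survives averaging untouched, which is exactly why the list above demands diversity and independence, not just aggregation.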
The Common Task Framework (2007-)
Donoho, D. (2017). “50 Years of Data Science.” Journal of Computational and Graphical Statistics, 26(4), 745–766.
Define a clear task (e.g., image recognition).
Provide dataset + ground truth labels + hidden test set.